Prosper Loan Exploration by Michelle Petersen

Overview

This report explores the Prosper loans dataset. According to Wikipedia,
Prosper Marketplace is a peer-to-peer lending company that operates
Prosper.com. Borrowers request personal loans on Prosper, and investors can
fund them. Prosper lists the loans, collects borrower payments, and
distributes the payments and interest back to the loan investors.

The dataset contains 113,937 loans with 81 variables on each loan. For a list of all the variable definitions, see: Prosper Loan Data - Variable Definitions.

Install Libraries

Load the Dataset
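
The loading code is not shown in this report. Below is a minimal sketch of one way a CSV export could be loaded so that it matches the summaries that follow, assuming base R's `read.csv`. The two-row file written here and the name `loans` are hypothetical stand-ins, not the real data. A column name such as `ProsperRating..numeric.` is what `read.csv` produces from a header such as `ProsperRating (numeric)`, and in R 4.0 or later `stringsAsFactors = TRUE` is needed for text columns to become factors.

```r
# Write a tiny stand-in CSV (made-up rows; the real file is not reproduced here).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ListingNumber,Term,LoanStatus,ProsperRating (numeric)",
  "1001,36,Completed,",
  "1002,60,Current,6"
), csv_path)

# check.names = TRUE (the default) turns spaces and parentheses into dots.
# stringsAsFactors = TRUE makes text columns factors.
loans <- read.csv(csv_path, stringsAsFactors = TRUE)

print(names(loans))  # the last name becomes "ProsperRating..numeric."
print(dim(loans))    # number of rows, then number of columns

# An empty field in a numeric column is read as NA.
print(sum(is.na(loans$ProsperRating..numeric.)))
```

On the real file, the `dim()` call is the kind of call that would report the row and column counts shown in the next section.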

Exploration of the Full Dataset

In this section I will explore the full dataset. After an initial analysis,
I will choose 10-15 variables to explore further.

Dataset Size by Row and Column:

## [1] 113937     81

Dataset Variables and Types

Numeric Variables:

##  ListingNumber          Term        BorrowerAPR       BorrowerRate   
##  Min.   :      4   Min.   :12.00   Min.   :0.00653   Min.   :0.0000  
##  1st Qu.: 400919   1st Qu.:36.00   1st Qu.:0.15629   1st Qu.:0.1340  
##  Median : 600554   Median :36.00   Median :0.20976   Median :0.1840  
##  Mean   : 627886   Mean   :40.83   Mean   :0.21883   Mean   :0.1928  
##  3rd Qu.: 892634   3rd Qu.:36.00   3rd Qu.:0.28381   3rd Qu.:0.2500  
##  Max.   :1255725   Max.   :60.00   Max.   :0.51229   Max.   :0.4975  
##                                    NA's   :25                        
##   LenderYield      EstimatedEffectiveYield EstimatedLoss  
##  Min.   :-0.0100   Min.   :-0.183          Min.   :0.005  
##  1st Qu.: 0.1242   1st Qu.: 0.116          1st Qu.:0.042  
##  Median : 0.1730   Median : 0.162          Median :0.072  
##  Mean   : 0.1827   Mean   : 0.169          Mean   :0.080  
##  3rd Qu.: 0.2400   3rd Qu.: 0.224          3rd Qu.:0.112  
##  Max.   : 0.4925   Max.   : 0.320          Max.   :0.366  
##                    NA's   :29084           NA's   :29084  
##  EstimatedReturn  ProsperRating..numeric.  ProsperScore  
##  Min.   :-0.183   Min.   :1.000           Min.   : 1.00  
##  1st Qu.: 0.074   1st Qu.:3.000           1st Qu.: 4.00  
##  Median : 0.092   Median :4.000           Median : 6.00  
##  Mean   : 0.096   Mean   :4.072           Mean   : 5.95  
##  3rd Qu.: 0.117   3rd Qu.:5.000           3rd Qu.: 8.00  
##  Max.   : 0.284   Max.   :7.000           Max.   :11.00  
##  NA's   :29084    NA's   :29084           NA's   :29084  
##  ListingCategory..numeric. EmploymentStatusDuration CreditScoreRangeLower
##  Min.   : 0.000            Min.   :  0.00           Min.   :  0.0        
##  1st Qu.: 1.000            1st Qu.: 26.00           1st Qu.:660.0        
##  Median : 1.000            Median : 67.00           Median :680.0        
##  Mean   : 2.774            Mean   : 96.07           Mean   :685.6        
##  3rd Qu.: 3.000            3rd Qu.:137.00           3rd Qu.:720.0        
##  Max.   :20.000            Max.   :755.00           Max.   :880.0        
##                            NA's   :7625             NA's   :591          
##  CreditScoreRangeUpper CurrentCreditLines OpenCreditLines
##  Min.   : 19.0         Min.   : 0.00      Min.   : 0.00  
##  1st Qu.:679.0         1st Qu.: 7.00      1st Qu.: 6.00  
##  Median :699.0         Median :10.00      Median : 9.00  
##  Mean   :704.6         Mean   :10.32      Mean   : 9.26  
##  3rd Qu.:739.0         3rd Qu.:13.00      3rd Qu.:12.00  
##  Max.   :899.0         Max.   :59.00      Max.   :54.00  
##  NA's   :591           NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio StatedMonthlyIncome TotalProsperLoans
##  Min.   : 0.000    Min.   :      0     Min.   :0.00     
##  1st Qu.: 0.140    1st Qu.:   3200     1st Qu.:1.00     
##  Median : 0.220    Median :   4667     Median :1.00     
##  Mean   : 0.276    Mean   :   5608     Mean   :1.42     
##  3rd Qu.: 0.320    3rd Qu.:   6825     3rd Qu.:2.00     
##  Max.   :10.010    Max.   :1750003     Max.   :8.00     
##  NA's   :8554                          NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount MonthlyLoanPayment LP_CustomerPayments
##  Min.   : 1000      Min.   :   0.0     Min.   :   -2.35   
##  1st Qu.: 4000      1st Qu.: 131.6     1st Qu.: 1005.76   
##  Median : 6500      Median : 217.7     Median : 2583.83   
##  Mean   : 8337      Mean   : 272.5     Mean   : 4183.08   
##  3rd Qu.:12000      3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  Max.   :35000      Max.   :2251.5     Max.   :40702.39   
##                                                           
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

Factor Variables:

##                    ListingKey                        ListingCreationDate
##  17A93590655669644DB4C06:     6   2013-10-02 17:20:16.550000000:     6  
##  349D3587495831350F0F648:     4   2013-08-28 20:31:41.107000000:     4  
##  47C1359638497431975670B:     4   2013-09-08 09:27:44.853000000:     4  
##  8474358854651984137201C:     4   2013-12-06 05:43:13.830000000:     4  
##  DE8535960513435199406CE:     4   2013-12-06 11:44:58.283000000:     4  
##  04C13599434217079754AEE:     3   2013-08-21 07:25:22.360000000:     3  
##  (Other)                :113912   (Other)                      :113912  
##   CreditGrade                    LoanStatus                  ClosedDate   
##         :84984   Current              :56576                      :58848  
##  C      : 5649   Completed            :38074   2014-03-04 00:00:00:  105  
##  D      : 5153   Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  B      : 4389   Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  AA     : 3509   Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  HR     : 3508   Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other): 6745   (Other)              : 1108   (Other)            :54633  
##  ProsperRating..Alpha. BorrowerState                      Occupation   
##         :29084         CA     :14717   Other                   :28617  
##  C      :18345         TX     : 6842   Professional            :13628  
##  B      :15581         NY     : 6729   Computer Programmer     : 4478  
##  A      :14551         FL     : 6720   Executive               : 4311  
##  D      :14274         IL     : 5921   Teacher                 : 3759  
##  E      : 9795                : 5515   Administrative Assistant: 3688  
##  (Other):12307         (Other):67493   (Other)                 :55456  
##       EmploymentStatus IsBorrowerHomeowner CurrentlyInGroup
##  Employed     :67322   False:56459         False:101218    
##  Full-time    :26355   True :57478         True : 12719    
##  Self-employed: 6134                                       
##  Not available: 5347                                       
##  Other        : 3806                                       
##               : 2255                                       
##  (Other)      : 2718                                       
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##         FirstRecordedCreditLine         IncomeRange    IncomeVerifiable
##                     :   697     $25,000-49,999:32192   False:  8669    
##  1993-12-01 00:00:00:   185     $50,000-74,999:31050   True :105268    
##  1994-11-01 00:00:00:   178     $100,000+     :17337                   
##  1995-11-01 00:00:00:   168     $75,000-99,999:16916                   
##  1990-04-01 00:00:00:   161     Not displayed : 7741                   
##  1995-03-01 00:00:00:   159     $1-24,999     : 7274                   
##  (Other)            :112389     (Other)       : 1427                   
##                     LoanKey                LoanOriginationDate
##  CB1B37030986463208432A1:     6   2014-01-22 00:00:00:   491  
##  2DEE3698211017519D7333F:     4   2013-11-13 00:00:00:   490  
##  9F4B37043517554537C364C:     4   2014-02-19 00:00:00:   439  
##  D895370150591392337ED6D:     4   2013-10-16 00:00:00:   434  
##  E6FB37073953690388BC56D:     4   2014-01-28 00:00:00:   339  
##  0D8F37036734373301ED419:     3   2013-09-24 00:00:00:   316  
##  (Other)                :113912   (Other)            :111428  
##  LoanOriginationQuarter                   MemberKey     
##  Q4 2013:14450          63CA34120866140639431C9:     9  
##  Q1 2014:12172          16083364744933457E57FB9:     8  
##  Q3 2013: 9180          3A2F3380477699707C81385:     8  
##  Q2 2013: 7099          4D9C3403302047712AD0CDD:     8  
##  Q3 2012: 5632          739C338135235294782AE75:     8  
##  Q2 2012: 5061          7E1733653050264822FAA3D:     8  
##  (Other):60343          (Other)                :113888

Variables Type Count (TRUE = numeric, FALSE = factor):

## 
## FALSE  TRUE 
##    20    61
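
These counts have the shape produced by tabulating `is.numeric` over the columns of a data frame. Below is a minimal sketch on a toy data frame. The name `loans` and its values are hypothetical, and only the column names are taken from the summaries above.

```r
# Toy stand-in with two numeric columns and one factor column.
loans <- data.frame(
  Term = c(36, 60, 36),
  BorrowerAPR = c(0.21, 0.15, NA),
  LoanStatus = factor(c("Current", "Completed", "Current"))
)

# One TRUE/FALSE per column: is the column numeric?
is_num <- sapply(loans, is.numeric)

# Count of non-numeric (FALSE) and numeric (TRUE) columns.
type_count <- table(is_num)
print(type_count)

# The same logical vector splits the data frame for separate summaries.
print(summary(loans[, is_num, drop = FALSE]))   # numeric columns
print(summary(loans[, !is_num, drop = FALSE]))  # factor columns
```

Using `drop = FALSE` keeps the result a data frame even when only one column is selected, so `summary()` still prints one block per variable.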

Full Dataset Statistical Summary Using stat.desc:
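
`stat.desc` comes from the pastecs package, and its row labels are terse. As a rough guide, the sketch below recomputes several of those rows in base R for one made-up vector, assuming the package default of a 95% confidence level for `CI.mean`. The helper name `describe_numeric` is hypothetical and is not part of pastecs. Non-numeric columns show NA in every row of the output.

```r
# Hypothetical helper: base R versions of several stat.desc rows for one vector.
describe_numeric <- function(x, p = 0.95) {
  vals <- x[!is.na(x)]                 # statistics are computed on non-missing values
  n <- length(vals)
  se <- sd(vals) / sqrt(n)             # standard error of the mean
  c(
    nbr.val  = n,                      # number of non-missing values
    nbr.null = sum(vals == 0),         # number of values equal to zero
    nbr.na   = sum(is.na(x)),          # number of missing values
    mean     = mean(vals),
    SE.mean  = se,
    CI.mean  = qt((1 + p) / 2, n - 1) * se,  # half-width of the CI of the mean
    std.dev  = sd(vals),
    coef.var = sd(vals) / mean(vals)   # standard deviation relative to the mean
  )
}

toy <- c(0, 2, 4, 6, NA)               # made-up values, not loan data
stats <- describe_numeric(toy)
print(round(stats, 4))
```

The same relationships can be checked against the output itself: in the `Term` column, `SE.mean` (3.091794e-02) equals `std.dev` (1.043621e+01) divided by the square root of `nbr.val` (1.139370e+05), and `coef.var` (2.556000e-01) equals `std.dev` divided by `mean` (4.083025e+01).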

##          ListingKey ListingNumber ListingCreationDate CreditGrade
## nbr.val          NA  1.139370e+05                  NA          NA
## nbr.null         NA  0.000000e+00                  NA          NA
## nbr.na           NA  0.000000e+00                  NA          NA
## min              NA  4.000000e+00                  NA          NA
## max              NA  1.255725e+06                  NA          NA
## range            NA  1.255721e+06                  NA          NA
## sum              NA  7.153941e+10                  NA          NA
## median           NA  6.005540e+05                  NA          NA
## mean             NA  6.278857e+05                  NA          NA
## SE.mean          NA  9.719466e+02                  NA          NA
## CI.mean          NA  1.905000e+03                  NA          NA
## var              NA  1.076340e+11                  NA          NA
## std.dev          NA  3.280762e+05                  NA          NA
## coef.var         NA  5.225095e-01                  NA          NA
##                  Term LoanStatus ClosedDate  BorrowerAPR BorrowerRate
## nbr.val  1.139370e+05         NA         NA 1.139120e+05 1.139370e+05
## nbr.null 0.000000e+00         NA         NA 0.000000e+00 8.000000e+00
## nbr.na   0.000000e+00         NA         NA 2.500000e+01 0.000000e+00
## min      1.200000e+01         NA         NA 6.530000e-03 0.000000e+00
## max      6.000000e+01         NA         NA 5.122900e-01 4.975000e-01
## range    4.800000e+01         NA         NA 5.057600e-01 4.975000e-01
## sum      4.652076e+06         NA         NA 2.492710e+04 2.196296e+04
## median   3.600000e+01         NA         NA 2.097600e-01 1.840000e-01
## mean     4.083025e+01         NA         NA 2.188277e-01 1.927641e-01
## SE.mean  3.091794e-02         NA         NA 2.381098e-04 2.216543e-04
## CI.mean  6.059869e-02         NA         NA 4.666916e-04 4.344391e-04
## var      1.089145e+02         NA         NA 6.458385e-03 5.597798e-03
## std.dev  1.043621e+01         NA         NA 8.036408e-02 7.481843e-02
## coef.var 2.556000e-01         NA         NA 3.672483e-01 3.881348e-01
##            LenderYield EstimatedEffectiveYield EstimatedLoss
## nbr.val   1.139370e+05            8.485300e+04  8.485300e+04
## nbr.null  1.000000e+01            1.000000e+00  0.000000e+00
## nbr.na    0.000000e+00            2.908400e+04  2.908400e+04
## min      -1.000000e-02           -1.827000e-01  4.900000e-03
## max       4.925000e-01            3.199000e-01  3.660000e-01
## range     5.025000e-01            5.026000e-01  3.611000e-01
## sum       2.081640e+04            1.431143e+04  6.814193e+03
## median    1.730000e-01            1.615000e-01  7.240000e-02
## mean      1.827010e-01            1.686615e-01  8.030586e-02
## SE.mean   2.207578e-04            2.350442e-04  1.605364e-04
## CI.mean   4.326819e-04            4.606847e-04  3.146501e-04
## var       5.552605e-03            4.687770e-03  2.186827e-03
## std.dev   7.451580e-02            6.846729e-02  4.676352e-02
## coef.var  4.078566e-01            4.059451e-01  5.823177e-01
##          EstimatedReturn ProsperRating..numeric. ProsperRating..Alpha.
## nbr.val     8.485300e+04            8.485300e+04                    NA
## nbr.null    1.000000e+00            0.000000e+00                    NA
## nbr.na      2.908400e+04            2.908400e+04                    NA
## min        -1.827000e-01            1.000000e+00                    NA
## max         2.837000e-01            7.000000e+00                    NA
## range       4.664000e-01            6.000000e+00                    NA
## sum         8.151683e+03            3.455420e+05                    NA
## median      9.170000e-02            4.000000e+00                    NA
## mean        9.606830e-02            4.072243e+00                    NA
## SE.mean     1.043721e-04            5.744090e-03                    NA
## CI.mean     2.045685e-04            1.125837e-02                    NA
## var         9.243489e-04            2.799688e+00                    NA
## std.dev     3.040311e-02            1.673227e+00                    NA
## coef.var    3.164739e-01            4.108859e-01                    NA
##          ProsperScore ListingCategory..numeric. BorrowerState Occupation
## nbr.val  8.485300e+04              1.139370e+05            NA         NA
## nbr.null 0.000000e+00              1.696500e+04            NA         NA
## nbr.na   2.908400e+04              0.000000e+00            NA         NA
## min      1.000000e+00              0.000000e+00            NA         NA
## max      1.100000e+01              2.000000e+01            NA         NA
## range    1.000000e+01              2.000000e+01            NA         NA
## sum      5.048810e+05              3.160850e+05            NA         NA
## median   6.000000e+00              1.000000e+00            NA         NA
## mean     5.950067e+00              2.774209e+00            NA         NA
## SE.mean  8.158388e-03              1.184076e-02            NA         NA
## CI.mean  1.599038e-02              2.320771e-02            NA         NA
## var      5.647756e+00              1.597438e+01            NA         NA
## std.dev  2.376501e+00              3.996797e+00            NA         NA
## coef.var 3.994074e-01              1.440698e+00            NA         NA
##          EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
## nbr.val                NA             1.063120e+05                  NA
## nbr.null               NA             1.534000e+03                  NA
## nbr.na                 NA             7.625000e+03                  NA
## min                    NA             0.000000e+00                  NA
## max                    NA             7.550000e+02                  NA
## range                  NA             7.550000e+02                  NA
## sum                    NA             1.021356e+07                  NA
## median                 NA             6.700000e+01                  NA
## mean                   NA             9.607158e+01                  NA
## SE.mean                NA             2.897687e-01                  NA
## CI.mean                NA             5.679427e-01                  NA
## var                    NA             8.926585e+03                  NA
## std.dev                NA             9.448061e+01                  NA
## coef.var               NA             9.834397e-01                  NA
##          CurrentlyInGroup GroupKey DateCreditPulled CreditScoreRangeLower
## nbr.val                NA       NA               NA          1.133460e+05
## nbr.null               NA       NA               NA          1.330000e+02
## nbr.na                 NA       NA               NA          5.910000e+02
## min                    NA       NA               NA          0.000000e+00
## max                    NA       NA               NA          8.800000e+02
## range                  NA       NA               NA          8.800000e+02
## sum                    NA       NA               NA          7.770636e+07
## median                 NA       NA               NA          6.800000e+02
## mean                   NA       NA               NA          6.855677e+02
## SE.mean                NA       NA               NA          1.973995e-01
## CI.mean                NA       NA               NA          3.869000e-01
## var                    NA       NA               NA          4.416702e+03
## std.dev                NA       NA               NA          6.645827e+01
## coef.var               NA       NA               NA          9.693904e-02
##          CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
## nbr.val           1.133460e+05                      NA       1.063330e+05
## nbr.null          0.000000e+00                      NA       3.850000e+02
## nbr.na            5.910000e+02                      NA       7.604000e+03
## min               1.900000e+01                      NA       0.000000e+00
## max               8.990000e+02                      NA       5.900000e+01
## range             8.800000e+02                      NA       5.900000e+01
## sum               7.985993e+07                      NA       1.097058e+06
## median            6.990000e+02                      NA       1.000000e+01
## mean              7.045677e+02                      NA       1.031719e+01
## SE.mean           1.973995e-01                      NA       1.673743e-02
## CI.mean           3.869000e-01                      NA       3.280514e-02
## var               4.416702e+03                      NA       2.978830e+01
## std.dev           6.645827e+01                      NA       5.457866e+00
## coef.var          9.432489e-02                      NA       5.290069e-01
##          OpenCreditLines TotalCreditLinespast7years OpenRevolvingAccounts
## nbr.val     1.063330e+05               1.132400e+05          1.139370e+05
## nbr.null    5.620000e+02               0.000000e+00          3.506000e+03
## nbr.na      7.604000e+03               6.970000e+02          0.000000e+00
## min         0.000000e+00               2.000000e+00          0.000000e+00
## max         5.400000e+01               1.360000e+02          5.100000e+01
## range       5.400000e+01               1.340000e+02          5.100000e+01
## sum         9.846610e+05               3.029684e+06          7.941170e+05
## median      9.000000e+00               2.500000e+01          6.000000e+00
## mean        9.260164e+00               2.675454e+01          6.969790e+00
## SE.mean     1.540275e-02               4.052720e-02          1.371954e-02
## CI.mean     3.018919e-02               7.943271e-02          2.689009e-02
## var         2.522696e+01               1.859915e+02          2.144588e+01
## std.dev     5.022644e+00               1.363787e+01          4.630970e+00
## coef.var    5.423926e-01               5.097404e-01          6.644346e-01
##          OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## nbr.val                 1.139370e+05         1.132400e+05   1.127780e+05
## nbr.null                5.227000e+03         5.000500e+04   8.430000e+03
## nbr.na                  0.000000e+00         6.970000e+02   1.159000e+03
## min                     0.000000e+00         0.000000e+00   0.000000e+00
## max                     1.498500e+04         1.050000e+02   3.790000e+02
## range                   1.498500e+04         1.050000e+02   3.790000e+02
## sum                     4.538021e+07         1.625090e+05   6.297980e+05
## median                  2.710000e+02         1.000000e+00   4.000000e+00
## mean                    3.982922e+02         1.435085e+00   5.584405e+00
## SE.mean                 1.324739e+00         7.243458e-03   1.914675e-02
## CI.mean                 2.596468e+00         1.419707e-02   3.752735e-02
## var                     1.999518e+05         5.941441e+00   4.134420e+01
## std.dev                 4.471597e+02         2.437507e+00   6.429946e+00
## coef.var                1.122693e+00         1.698511e+00   1.151411e+00
##          CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## nbr.val          1.132400e+05     1.063150e+05            1.129470e+05
## nbr.null         8.974200e+04     8.981800e+04            7.643900e+04
## nbr.na           6.970000e+02     7.622000e+03            9.900000e+02
## min              0.000000e+00     0.000000e+00            0.000000e+00
## max              8.300000e+01     4.638810e+05            9.900000e+01
## range            8.300000e+01     4.638810e+05            9.900000e+01
## sum              6.704400e+04     1.046679e+08            4.692930e+05
## median           0.000000e+00     0.000000e+00            0.000000e+00
## mean             5.920523e-01     9.845071e+02            4.154984e+00
## SE.mean          5.880057e-03     2.195386e+01            3.023191e-02
## CI.mean          1.152482e-02     4.302926e+01            5.925409e-02
## var              3.915280e+00     5.124083e+07            1.032300e+02
## std.dev          1.978707e+00     7.158270e+03            1.016022e+01
## coef.var         3.342115e+00     7.270918e+00            2.445308e+00
##          PublicRecordsLast10Years PublicRecordsLast12Months
## nbr.val              1.132400e+05              1.063330e+05
## nbr.null             8.580300e+04              1.049410e+05
## nbr.na               6.970000e+02              7.604000e+03
## min                  0.000000e+00              0.000000e+00
## max                  3.800000e+01              2.000000e+01
## range                3.800000e+01              2.000000e+01
## sum                  3.540400e+04              1.605000e+03
## median               0.000000e+00              0.000000e+00
## mean                 3.126457e-01              1.509409e-02
## SE.mean              2.162981e-03              4.725472e-04
## CI.mean              4.239410e-03              9.261861e-04
## var                  5.297918e-01              2.374425e-02
## std.dev              7.278680e-01              1.540917e-01
## coef.var             2.328092e+00              1.020874e+01
##          RevolvingCreditBalance BankcardUtilization
## nbr.val            1.063330e+05        1.063330e+05
## nbr.null           4.059000e+03        6.782000e+03
## nbr.na             7.604000e+03        7.604000e+03
## min                0.000000e+00        0.000000e+00
## max                1.435667e+06        5.950000e+00
## range              1.435667e+06        5.950000e+00
## sum                1.871323e+09        5.968563e+04
## median             8.549000e+03        6.000000e-01
## mean               1.759871e+04        5.613086e-01
## SE.mean            1.010048e+02        9.749470e-04
## CI.mean            1.979681e+02        1.910883e-03
## var                1.084807e+09        1.010718e-01
## std.dev            3.293640e+04        3.179179e-01
## coef.var           1.871524e+00        5.663871e-01
##          AvailableBankcardCredit  TotalTrades
## nbr.val             1.063930e+05 1.063930e+05
## nbr.null            4.881000e+03 4.000000e+00
## nbr.na              7.544000e+03 7.544000e+03
## min                 0.000000e+00 0.000000e+00
## max                 6.462850e+05 1.260000e+02
## range               6.462850e+05 1.260000e+02
## sum                 1.192690e+09 2.471513e+06
## median              4.100000e+03 2.200000e+01
## mean                1.121023e+04 2.323003e+01
## SE.mean             6.075908e+01 3.639504e-02
## CI.mean             1.190870e+02 7.133377e-02
## var                 3.927674e+08 1.409280e+02
## std.dev             1.981836e+04 1.187131e+01
## coef.var            1.767882e+00 5.110329e-01
##          TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## nbr.val                        1.063930e+05            1.063930e+05
## nbr.null                       5.100000e+01            5.424900e+04
## nbr.na                         7.544000e+03            7.544000e+03
## min                            0.000000e+00            0.000000e+00
## max                            1.000000e+00            2.000000e+01
## range                          1.000000e+00            2.000000e+01
## sum                            9.425326e+04            8.536200e+04
## median                         9.400000e-01            0.000000e+00
## mean                           8.858972e-01            8.023272e-01
## SE.mean                        4.542873e-04            3.365132e-03
## CI.mean                        8.903968e-04            6.595612e-03
## var                            2.195706e-02            1.204806e+00
## std.dev                        1.481791e-01            1.097637e+00
## coef.var                       1.672645e-01            1.368066e+00
##          DebtToIncomeRatio IncomeRange IncomeVerifiable
## nbr.val       1.053830e+05          NA               NA
## nbr.null      1.900000e+01          NA               NA
## nbr.na        8.554000e+03          NA               NA
## min           0.000000e+00          NA               NA
## max           1.001000e+01          NA               NA
## range         1.001000e+01          NA               NA
## sum           2.908008e+04          NA               NA
## median        2.200000e-01          NA               NA
## mean          2.759466e-01          NA               NA
## SE.mean       1.699668e-03          NA               NA
## CI.mean       3.331326e-03          NA               NA
## var           3.044379e-01          NA               NA
## std.dev       5.517589e-01          NA               NA
## coef.var      1.999513e+00          NA               NA
##          StatedMonthlyIncome LoanKey TotalProsperLoans
## nbr.val         1.139370e+05      NA      2.208500e+04
## nbr.null        1.394000e+03      NA      1.000000e+00
## nbr.na          0.000000e+00      NA      9.185200e+04
## min             0.000000e+00      NA      0.000000e+00
## max             1.750003e+06      NA      8.000000e+00
## range           1.750003e+06      NA      8.000000e+00
## sum             6.389616e+08      NA      3.138500e+04
## median          4.666667e+03      NA      1.000000e+00
## mean            5.608026e+03      NA      1.421100e+00
## SE.mean         2.215552e+01      NA      5.141249e-03
## CI.mean         4.342448e+01      NA      1.007722e-02
## var             5.592792e+07      NA      5.837605e-01
## std.dev         7.478497e+03      NA      7.640422e-01
## coef.var        1.333535e+00      NA      5.376413e-01
##          TotalProsperPaymentsBilled OnTimeProsperPayments
## nbr.val                2.208500e+04          2.208500e+04
## nbr.null               6.500000e+01          7.500000e+01
## nbr.na                 9.185200e+04          9.185200e+04
## min                    0.000000e+00          0.000000e+00
## max                    1.410000e+02          1.410000e+02
## range                  1.410000e+02          1.410000e+02
## sum                    5.065050e+05          4.918760e+05
## median                 1.600000e+01          1.500000e+01
## mean                   2.293434e+01          2.227195e+01
## SE.mean                1.295307e-01          1.267102e-01
## CI.mean                2.538894e-01          2.483609e-01
## var                    3.705465e+02          3.545849e+02
## std.dev                1.924958e+01          1.883042e+01
## coef.var               8.393344e-01          8.454772e-01
##          ProsperPaymentsLessThanOneMonthLate
## nbr.val                         2.208500e+04
## nbr.null                        1.828500e+04
## nbr.na                          9.185200e+04
## min                             0.000000e+00
## max                             4.200000e+01
## range                           4.200000e+01
## sum                             1.355200e+04
## median                          0.000000e+00
## mean                            6.136292e-01
## SE.mean                         1.646473e-02
## CI.mean                         3.227205e-02
## var                             5.986963e+00
## std.dev                         2.446827e+00
## coef.var                        3.987469e+00
##          ProsperPaymentsOneMonthPlusLate ProsperPrincipalBorrowed
## nbr.val                     2.208500e+04             2.208500e+04
## nbr.null                    2.170000e+04             1.000000e+00
## nbr.na                      9.185200e+04             9.185200e+04
## min                         0.000000e+00             0.000000e+00
## max                         2.100000e+01             7.249900e+04
## range                       2.100000e+01             7.249900e+04
## sum                         1.072000e+03             1.871110e+08
## median                      0.000000e+00             6.000000e+03
## mean                        4.853973e-02             8.472312e+03
## SE.mean                     3.743250e-03             4.976446e+01
## CI.mean                     7.337037e-03             9.754189e+01
## var                         3.094532e-01             5.469353e+07
## std.dev                     5.562852e-01             7.395508e+03
## coef.var                    1.146041e+01             8.729031e-01
##          ProsperPrincipalOutstanding ScorexChangeAtTimeOfListing
## nbr.val                 2.208500e+04                1.892800e+04
## nbr.null                5.943000e+03                1.127000e+03
## nbr.na                  9.185200e+04                9.500900e+04
## min                     0.000000e+00               -2.090000e+02
## max                     2.345095e+04                2.860000e+02
## range                   2.345095e+04                4.950000e+02
## sum                     6.471598e+07               -6.100900e+04
## median                  1.626550e+03               -3.000000e+00
## mean                    2.930314e+03               -3.223214e+00
## SE.mean                 2.561489e+01                3.638894e-01
## CI.mean                 5.020702e+01                7.132558e-01
## var                     1.449047e+07                2.506361e+03
## std.dev                 3.806635e+03                5.006357e+01
## coef.var                1.299054e+00               -1.553219e+01
##          LoanCurrentDaysDelinquent LoanFirstDefaultedCycleNumber
## nbr.val               1.139370e+05                  1.695200e+04
## nbr.null              9.486000e+04                  7.000000e+01
## nbr.na                0.000000e+00                  9.698500e+04
## min                   0.000000e+00                  0.000000e+00
## max                   2.704000e+03                  4.400000e+01
## range                 2.704000e+03                  4.400000e+01
## sum                   1.741146e+07                  2.757830e+05
## median                0.000000e+00                  1.400000e+01
## mean                  1.528165e+02                  1.626846e+01
## SE.mean               1.381503e+00                  6.916981e-02
## CI.mean               2.707725e+00                  1.355800e-01
## var                   2.174546e+05                  8.110620e+01
## std.dev               4.663203e+02                  9.005898e+00
## coef.var              3.051504e+00                  5.535801e-01
##          LoanMonthsSinceOrigination   LoanNumber LoanOriginalAmount
## nbr.val                1.139370e+05 1.139370e+05       1.139370e+05
## nbr.null               1.822000e+03 0.000000e+00       0.000000e+00
## nbr.na                 0.000000e+00 0.000000e+00       0.000000e+00
## min                    0.000000e+00 1.000000e+00       1.000000e+03
## max                    1.000000e+02 1.364860e+05       3.500000e+04
## range                  1.000000e+02 1.364850e+05       3.400000e+04
## sum                    3.634235e+06 7.912295e+09       9.498943e+08
## median                 2.100000e+01 6.859900e+04       6.500000e+03
## mean                   3.189688e+01 6.944447e+04       8.337014e+03
## SE.mean                8.880041e-02 1.153340e+02       1.850358e+01
## CI.mean                1.740475e-01 2.260529e+02       3.626673e+01
## var                    8.984517e+02 1.515582e+09       3.901002e+07
## std.dev                2.997418e+01 3.893048e+04       6.245801e+03
## coef.var               9.397215e-01 5.605987e-01       7.491652e-01
##          LoanOriginationDate LoanOriginationQuarter MemberKey
## nbr.val                   NA                     NA        NA
## nbr.null                  NA                     NA        NA
## nbr.na                    NA                     NA        NA
## min                       NA                     NA        NA
## max                       NA                     NA        NA
## range                     NA                     NA        NA
## sum                       NA                     NA        NA
## median                    NA                     NA        NA
## mean                      NA                     NA        NA
## SE.mean                   NA                     NA        NA
## CI.mean                   NA                     NA        NA
## var                       NA                     NA        NA
## std.dev                   NA                     NA        NA
## coef.var                  NA                     NA        NA
##          MonthlyLoanPayment LP_CustomerPayments
## nbr.val        1.139370e+05        1.139370e+05
## nbr.null       9.350000e+02        6.208000e+03
## nbr.na         0.000000e+00        0.000000e+00
## min            0.000000e+00       -2.349900e+00
## max            2.251510e+03        4.070239e+04
## range          2.251510e+03        4.070474e+04
## sum            3.104507e+07        4.766075e+08
## median         2.177400e+02        2.583830e+03
## mean           2.724758e+02        4.183079e+03
## SE.mean        5.708794e-01        1.419337e+01
## CI.mean        1.118915e+00        2.781878e+01
## var            3.713245e+04        2.295279e+07
## std.dev        1.926978e+02        4.790907e+03
## coef.var       7.072108e-01        1.145306e+00
##          LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## nbr.val                  1.139370e+05       1.139370e+05   1.139370e+05
## nbr.null                 6.308000e+03       6.223000e+03   7.164000e+03
## nbr.na                   0.000000e+00       0.000000e+00   0.000000e+00
## min                      0.000000e+00      -2.349900e+00  -6.648700e+02
## max                      3.500000e+04       1.561703e+04   3.206000e+01
## range                    3.500000e+04       1.561938e+04   6.969300e+02
## sum                      3.538355e+08       1.227720e+08  -6.235275e+06
## median                   1.587500e+03       7.008401e+02  -3.444000e+01
## mean                     3.105537e+03       1.077543e+03  -5.472564e+01
## SE.mean                  1.205623e+01       3.505939e+00   1.797548e-01
## CI.mean                  2.363003e+01       6.871587e+00   3.523166e-01
## var                      1.656106e+07       1.400469e+06   3.681507e+03
## std.dev                  4.069528e+03       1.183414e+03   6.067542e+01
## coef.var                 1.310410e+00       1.098252e+00  -1.108720e+00
##          LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## nbr.val       1.139370e+05          1.139370e+05        1.139370e+05
## nbr.null      1.057710e+05          9.703400e+04        9.722200e+04
## nbr.na        0.000000e+00          0.000000e+00        0.000000e+00
## min          -9.274750e+03         -9.420000e+01       -9.545500e+02
## max           0.000000e+00          2.500000e+04        2.500000e+04
## range         9.274750e+03          2.509420e+04        2.595455e+04
## sum          -1.622770e+06          7.980675e+07        7.763901e+07
## median        0.000000e+00          0.000000e+00        0.000000e+00
## mean         -1.424270e+01          7.004463e+02        6.814205e+02
## SE.mean       3.236089e-01          7.076123e+00        6.983256e+00
## CI.mean       6.342686e-01          1.386909e+01        1.368708e+01
## var           1.193180e+04          5.704998e+06        5.556237e+06
## std.dev       1.092328e+02          2.388514e+03        2.357167e+03
## coef.var     -7.669387e+00          3.409988e+00        3.459196e+00
##          LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## nbr.val                     1.139370e+05  1.139370e+05    1.139370e+05
## nbr.null                    1.106760e+05  0.000000e+00    1.096780e+05
## nbr.na                      0.000000e+00  0.000000e+00    0.000000e+00
## min                         0.000000e+00  7.000000e-01    0.000000e+00
## max                         2.111790e+04  1.012500e+00    3.900000e+01
## range                       2.111790e+04  3.125000e-01    3.900000e+01
## sum                         2.864682e+06  1.137756e+05    5.472000e+03
## median                      0.000000e+00  1.000000e+00    0.000000e+00
## mean                        2.514269e+01  9.985835e-01    4.802654e-02
## SE.mean                     8.166540e-01  5.308565e-05    9.846166e-04
## CI.mean                     1.600629e+00  1.040471e-04    1.929834e-03
## var                         7.598730e+04  3.210843e-04    1.104585e-01
## std.dev                     2.756579e+02  1.791882e-02    3.323530e-01
## coef.var                    1.096374e+01  1.794424e-02    6.920194e+00
##          InvestmentFromFriendsCount InvestmentFromFriendsAmount
## nbr.val                1.139370e+05                1.139370e+05
## nbr.null               1.118060e+05                1.118060e+05
## nbr.na                 0.000000e+00                0.000000e+00
## min                    0.000000e+00                0.000000e+00
## max                    3.300000e+01                2.500000e+04
## range                  3.300000e+01                2.500000e+04
## sum                    2.673000e+03                1.885743e+06
## median                 0.000000e+00                0.000000e+00
## mean                   2.346033e-02                1.655075e+01
## SE.mean                6.885352e-04                8.726094e-01
## CI.mean                1.349518e-03                1.710301e+00
## var                    5.401533e-02                8.675701e+04
## std.dev                2.324120e-01                2.945454e+02
## coef.var               9.906593e+00                1.779650e+01
##             Investors
## nbr.val  1.139370e+05
## nbr.null 0.000000e+00
## nbr.na   0.000000e+00
## min      1.000000e+00
## max      1.189000e+03
## range    1.188000e+03
## sum      9.169106e+06
## median   4.400000e+01
## mean     8.047523e+01
## SE.mean  3.058521e-01
## CI.mean  5.994655e-01
## var      1.065830e+04
## std.dev  1.032390e+02
## coef.var 1.282867e+00

Univariate Plots Section

Analysis of Loan Origination Count by Year

There is a surprising drop in loans in 2009. In searching the news from
that time I found this Prosper News Story.
In late 2008, the SEC forced Prosper.com to stop brokering new loans
temporarily while it determined whether Prosper’s loans should be classified
as securities. After a quiet period lasting into mid-2009, Prosper reopened
to lenders and borrowers. Prosper made other changes to its business,
including only allowing borrowers with a credit score above 640 to request a
loan. I added a cohort variable for loans before and after 2010 to look for
other interesting similarities and differences.
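
The before/after split described above could be derived along these lines. This is a minimal sketch, not the code used for this report: the vector `quarters` holds a few quarter labels in the same format as LoanOriginationQuarter, and the names `origination_year` and `cohort` are hypothetical. The cutoff (years before 2010 versus 2010 and later) is an assumption.

```r
# A few quarter labels in the "Qn YYYY" format used by LoanOriginationQuarter
quarters <- c("Q4 2008", "Q2 2009", "Q1 2010", "Q4 2013")

# Drop the leading "Qn " to keep only the four-digit year
origination_year <- as.integer(sub("^Q[1-4] ", "", quarters))

# Assumed cutoff: loans originated before 2010 vs. in 2010 or later
cohort <- factor(
  ifelse(origination_year < 2010, "pre-2010", "post-2010"),
  levels = c("pre-2010", "post-2010")
)

table(cohort)
```

Fixing the factor levels explicitly keeps "pre-2010" ahead of "post-2010" in tables and plot legends instead of relying on alphabetical order.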

Summary statistics for Loan Origination Quarter

## Q4 2005 Q1 2006 Q2 2006 Q3 2006 Q4 2006 Q1 2007 Q2 2007 Q3 2007 Q4 2007 
##      22     315    1254    1934    2403    3079    3118    2671    2592 
## Q1 2008 Q2 2008 Q3 2008 Q4 2008 Q2 2009 Q3 2009 Q4 2009 Q1 2010 Q2 2010 
##    3074    4344    3602     532      13     585    1449    1243    1539 
## Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011 Q1 2012 Q2 2012 Q3 2012 
##    1270    1600    1744    2478    3093    3913    4435    5061    5632 
## Q4 2012 Q1 2013 Q2 2013 Q3 2013 Q4 2013 Q1 2014 
##    4425    3616    7099    9180   14450   12172

Analysis of Loan Origination Count by Quarter

I’ve added color differences in the graphs for loans before and after 2010.
There is a drop from Q4 2012 through Q1 2013, followed by consistent growth
in the number of loans, peaking at 14450 in Q4 of 2013.

Analysis of Loan Origination Count by Month

January is the biggest month for new loans, followed by October and December.

Analysis of Loan Origination Count by Day of the Month

The number of loans listed rises over the month and peaks towards the end of
the month. The highest number appears on the 30th.

Summary statistics for Loan Original Amount pre-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2500    4200    6050    7500   25000

Summary statistics for Loan Original Amount post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    7784    9191   14000   35000

Analysis of Loan Original Amount

There is a difference in loan amount before 2010 and after 2010. The minimum
loan amount pre-2010 and post-2010 remained the same at $1000. However, the
maximum loan amount increased from $25000 to $35000.

The mean increased from $6050 to $9191. Post-2010 there are spikes in the
number of loans at $4000, $10000, and $15000.

For both pre-2010 and post-2010 data the distribution is skewed to the right.
Post-2010, only a small number of loans are greater than $25000.

Summary statistics for Listing Category pre-2010

##      NotAvailable DebtConsolidation   HomeImprovement          Business 
##             16945              5993               813              2088 
##      PersonalLoan        StudentUse              Auto             Other 
##              2395               595               450              1708 
##   BabyAndAdoption              Boat CosmeticProcedure     EngagmentRing 
##                 0                 0                 0                 0 
##        GreenLoans HouseholdExpenses    LargePurchases     MedicalDental 
##                 0                 0                 0                 0 
##        Motorcycle                RV             Taxes          Vacation 
##                 0                 0                 0                 0 
##       WeddingLoan 
##                 0

Summary statistics for Listing Category post-2010

##      NotAvailable DebtConsolidation   HomeImprovement          Business 
##                20             52315              6620              5101 
##      PersonalLoan        StudentUse              Auto             Other 
##                 0               161              2122              8786 
##   BabyAndAdoption              Boat CosmeticProcedure     EngagmentRing 
##               199                85                91               217 
##        GreenLoans HouseholdExpenses    LargePurchases     MedicalDental 
##                59              1996               876              1522 
##        Motorcycle                RV             Taxes          Vacation 
##               304                52               885               768 
##       WeddingLoan 
##               771

Analysis of Listing Category

Pre-2010 the listing category wasn’t very informative. Almost 17000 of the
loans had a listing category of “Not Available”. Post-2010 there are only 20
loans where this information is not available.

The debt consolidation category is the largest, with a little over 50% of all
loans listed and about 63% of post-2010 loans.
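
The shares quoted above can be reproduced from the two Listing Category tables. The sketch below copies the non-zero counts from those tables; the object names and the helper `share_pct` are hypothetical and only serve this illustration.

```r
# Non-zero counts copied from the pre-2010 Listing Category table
pre_2010 <- c(NotAvailable = 16945, DebtConsolidation = 5993,
              HomeImprovement = 813, Business = 2088, PersonalLoan = 2395,
              StudentUse = 595, Auto = 450, Other = 1708)

# Non-zero counts copied from the post-2010 Listing Category table
post_2010 <- c(NotAvailable = 20, DebtConsolidation = 52315,
               HomeImprovement = 6620, Business = 5101, StudentUse = 161,
               Auto = 2122, Other = 8786, BabyAndAdoption = 199, Boat = 85,
               CosmeticProcedure = 91, EngagmentRing = 217, GreenLoans = 59,
               HouseholdExpenses = 1996, LargePurchases = 876,
               MedicalDental = 1522, Motorcycle = 304, RV = 52, Taxes = 885,
               Vacation = 768, WeddingLoan = 771)

# Hypothetical helper: percentage share of one category within a table
share_pct <- function(counts, category) {
  round(100 * counts[[category]] / sum(counts), 1)
}

share_pct(pre_2010, "DebtConsolidation")   # 19.3
share_pct(post_2010, "DebtConsolidation")  # 63.1

# Overall share across both cohorts (113937 loans in total)
overall_pct <- round(
  100 * (pre_2010[["DebtConsolidation"]] + post_2010[["DebtConsolidation"]]) /
    (sum(pre_2010) + sum(post_2010)), 1)
overall_pct                                # 51.2
```

The two tables sum to 30987 and 82950 loans, which together match the 113937 rows of the dataset.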

Summary statistics for MonthlyLoanPayment pre-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   82.44  149.03  212.06  271.25 1130.90

Summary statistics for MonthlyLoanPayment post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   160.1   256.4   295.0   390.4  2251.5

Analysis of Monthly Loan Payment

The distribution for MonthlyLoanPayment is right skewed: both pre-2010 and
post-2010 mean values are greater than the median and there is a long tail to
the right. The median payment is about $107 higher post-2010 (256.40) than
pre-2010 (149.03).

Analysis of Term

Loans have terms of 1, 3, or 5 years. Almost 80% of loans are for a
3 year term.

Summary statistics for Lower Credit Score pre-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   600.0   640.0   648.2   700.0   880.0     591

Summary statistics for Lower Credit Score post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   660.0   700.0   699.3   720.0   880.0

Analysis of Lower Credit Score

Here are the Credit Score Ranges as defined by Experian, TransUnion, and Equifax:

  • Bad 0 - 550
  • Poor 551 - 649
  • Fair 650 - 699
  • Good 700 - 749
  • Excellent 750 and above
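
The ranges listed above map directly onto base R's `cut()`. This is a sketch under assumptions rather than the report's actual code: the scores are made-up values chosen to hit each boundary, and `score_group` is a hypothetical name standing in for variables such as UpperCreditScoreGroup.

```r
# Hypothetical scores used only to exercise each boundary of the ranges
scores <- c(19, 550, 551, 649, 650, 699, 700, 749, 750, 899, NA)

# cut() uses right-closed intervals by default:
# (-Inf,550], (550,649], (649,699], (699,749], (749,Inf]
score_group <- cut(
  scores,
  breaks = c(-Inf, 550, 649, 699, 749, Inf),
  labels = c("Bad", "Poor", "Fair", "Good", "Excellent"),
  ordered_result = TRUE
)

# Missing scores stay NA and are counted separately
table(score_group, useNA = "ifany")
```

Because credit scores are whole numbers, placing the breaks at 550, 649, 699, and 749 reproduces the listed ranges exactly.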

Pre-2010, Prosper allowed loans to be listed for borrowers with a “Bad”
credit rating. In addition there were 591 pre-2010 loans listed where
the lower credit score was “Not Available”.

The median rose from 640 pre-2010 to 700 post-2010. The pre-2010 lower credit
score data is skewed slightly to the right since the mean (648.2) is greater
than the median (640). The post-2010 lower credit score data looks roughly
normally distributed: the median and mean are very close at 700 and 699.3
respectively.

Summary statistics for Upper Credit Score pre-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    19.0   619.0   659.0   667.2   719.0   899.0     591

Summary statistics for Upper Credit Score post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   619.0   679.0   719.0   718.3   739.0   899.0

Analysis of Upper Credit Score

Pre-2010 borrowers had a median upper credit score of 659 while post-2010
borrowers had a median upper credit score of 719. The upper credit score
values look normally distributed.

Summary statistics for Credit Grade pre-2010

##  N/A   NC   HR    E    D    C    B    A   AA NA's 
##    0  141 3508 3289 5153 5649 4389 3315 3509 2034

Analysis of Credit Grade

The Credit Grade is the rating that was assigned to pre-2010 loans at the
time the listing went live. I could not determine via the Prosper website
search how the Credit Grade was determined. Credit Grades, from lowest-risk
to highest-risk, are labeled AA, A, B, C, D, E, HR (“High Risk”), and NC
(“No Credit”).

There were 141 loans listed for “NC” (No Credit) borrowers and 3508 loans
listed for “HR” (High Risk) borrowers. It is surprising how many loans were
listed for borrowers in the Credit Grade groups lower than C.

Summary statistics for Prosper Rating post-2010

##   N/A    HR     E     D     C     B     A    AA  NA's 
##     0  6728  9602 13941 17955 15483 14093  5075    73

Analysis of Prosper Rating

Prosper Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D,
E, and HR (“High Risk”). Post-2010 Prosper provides a proprietary “Prosper
Rating” based on the company’s estimation of that borrower’s “estimated loss
rate.”

According to the company, the Prosper Rating is determined by two scores:

  • the credit score, obtained from an official credit reporting agency
  • the Prosper Score, figured in-house based on the Prosper population

Even though Credit Grade and Prosper Rating look similar, I decided not to
combine them for my analysis since the company decided to stop using Credit
Grade and switch to Prosper Rating.

Summary statistics for Prosper Score post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   4.000   6.000   5.906   8.000  11.000      73
## 
##     1     2     3     4     5     6     7     8     9    10    11 
##   971  5750  7613 12541  9695 12103 10367 11629  6245  4507  1456

Analysis of Prosper Score

The Prosper Score is a custom risk score for post-2010 loans built using
historical Prosper data. The score ranges from 1 to 11, with 11 being the
best, or lowest risk, score. The Prosper Score estimates the probability of a
loan going “bad,” where “bad” means going 60+ days past due within
the first twelve months from the date of loan origination. Prosper Scorecard Link

The Prosper Score looks roughly normally distributed.

Analysis of Income Range

The distribution of Income Ranges is left skewed with over 50% of the loans
falling in the $25,000 - $75,000 income ranges. Prior to 2010 almost 6% of
loans didn’t include borrower income range information. Post-2010, loans do
include borrower income range information.

Summary statistics for Borrower Rate pre-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1250  0.1700  0.1840  0.2375  0.4975

Summary statistics for Borrower Rate post-2010

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.1364  0.1875  0.1960  0.2573  0.3600

Analysis of Borrower Rate

The Borrower Rate is the Borrower’s interest rate for this loan. The median
Borrower Rate for pre-2010 loans was 17%, which is lower than the median of
18.75% post-2010. The max Borrower Rate for pre-2010 loans was almost 50%
(49.75%) vs 36% post-2010. The interquartile range pre-2010 is 11.25% where
post-2010 the interquartile range is 12.09%.

It surprised me that Borrowers were paying a higher rate for a Prosper loan
post-2010 compared to pre-2010 since interest rates for things like mortgages
were lower. Link to Historical Mortgage Rates.
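
The interquartile ranges above follow directly from the quartiles in the two Borrower Rate summaries. The short check below copies those quartiles; the object names are hypothetical.

```r
# 1st and 3rd quartiles copied from the Borrower Rate summaries above
q1 <- c(pre2010 = 0.1250, post2010 = 0.1364)
q3 <- c(pre2010 = 0.2375, post2010 = 0.2573)

# Interquartile range in percentage points
iqr_percent <- round(100 * (q3 - q1), 2)
iqr_percent
```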

Analysis of Loan Status

I defined a factor variable for loans in “GoodStanding” vs “InDefault” to
group loans for further analysis. I grouped loans as “InDefault” if the loan
status might affect a borrower’s credit rating.

Loans in the “InDefault” group:

  • Chargedoff
  • Defaulted
  • Past Due (31-60 days)
  • Past Due (61-90 days)
  • Past Due (91-120 days)
  • Past Due (>120 days)

Pre-2010 the percentage of loans “InDefault” was about 35%. Post-2010 about
10% of loans listed were “InDefault”.
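
The grouping described in this section could be expressed as follows. This is a sketch, not the report's code: the vector `loan_status` is a made-up sample, and the statuses outside the “InDefault” list ("Current", "Completed", "Past Due (1-15 days)") are assumed example values for LoanStatus.

```r
# Statuses that this report places in the "InDefault" group
in_default_statuses <- c(
  "Chargedoff", "Defaulted",
  "Past Due (31-60 days)", "Past Due (61-90 days)",
  "Past Due (91-120 days)", "Past Due (>120 days)"
)

# Made-up sample of loan statuses (assumed values, not real records)
loan_status <- c("Current", "Completed", "Chargedoff",
                 "Past Due (1-15 days)", "Past Due (61-90 days)", "Defaulted")

# Everything not in the list above falls into "GoodStanding"
loan_status_group <- factor(
  ifelse(loan_status %in% in_default_statuses, "InDefault", "GoodStanding"),
  levels = c("GoodStanding", "InDefault")
)

# Percentage of loans in each group
round(100 * prop.table(table(loan_status_group)), 1)
```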

Univariate Analysis

What is the structure of your dataset?

The Prosper loans dataset contains 113937 rows and 81 variables. There are 70
numeric variables and 26 factor variables. The dataset met the criteria for
tidiness as defined by Wikipedia here.

  • Each variable you measure should be in one column.
  • Each different observation of that variable should be in a different row.
  • There should be one table for each “kind” of variable.
  • If you have multiple tables, they should include a column in the table that
    allows them to be linked.

What is/are the main feature(s) of interest in your dataset?

The main areas I am focused on are how the changes in Prosper’s business
affected its two customer groups, borrowers and lenders.

  • Borrower - What features are of interest and may impact how much I will pay
    to borrow money?
  • Lender - What features are of interest and may impact if the loan will get
    paid back?

  • LoanOriginationDate
  • LoanStatus
  • BorrowerRate
  • CreditGrade
  • ProsperRating
  • ProsperScore
  • IncomeRange
  • CreditScoreRangeLower
  • CreditScoreRangeUpper
  • ListingCategory
  • LoanOriginalAmount

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

  • Term
  • MonthlyLoanPayment
  • IncomeVerifiable
  • DebtToIncomeRatio
  • BankcardUtilization
  • EmploymentStatus

Did you create any new variables from existing variables in the dataset?

Yes, I created some new variables, namely:

  • OriginationYear
  • OriginationMonth
  • OriginationDay
  • UpperCreditScoreGroup
  • LowerCreditScoreGroup
  • LoanStatusGroup

Each of these variables helped with creating graphs with distinct or
labeled groups.
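
The date-based variables in the list above could be derived from LoanOriginationDate roughly as follows. This is a sketch under assumptions: it supposes the date is stored as text beginning with `YYYY-MM-DD`, the two dates are made-up examples, and the data frame name `dates_demo` is hypothetical.

```r
# Made-up origination dates in an assumed "YYYY-MM-DD hh:mm:ss" text format
dates_demo <- data.frame(
  LoanOriginationDate = c("2008-01-31 00:00:00", "2013-10-30 00:00:00"),
  stringsAsFactors = FALSE
)

# Keep the date part and parse it
parsed <- as.Date(substr(dates_demo$LoanOriginationDate, 1, 10),
                  format = "%Y-%m-%d")

# Numeric year, month, and day of the month
dates_demo$OriginationYear  <- as.integer(format(parsed, "%Y"))
dates_demo$OriginationMonth <- as.integer(format(parsed, "%m"))
dates_demo$OriginationDay   <- as.integer(format(parsed, "%d"))

dates_demo
```

Extracting numeric parts with `format()` avoids locale-dependent month names; the built-in `month.abb` vector can supply labels for plotting if needed.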

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

I performed some operations to adjust the data mainly for correctly ordering
factor variables.

  • LoanOriginationQuarter
  • ListingCategory
  • IncomeRange
  • LoanStatus
  • ProsperRating
  • CreditGrade
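
Ordering factor variables of this kind might look like the sketch below. It is an illustration, not the report's code: the input vectors are small made-up samples, and the object names are hypothetical. The rating order (HR lowest, AA highest) follows the order shown in the Prosper Rating summary.

```r
# Made-up sample of ratings in arbitrary order
rating <- c("C", "AA", "HR", "B", "A", "E", "D")

# Explicit level order from highest risk to lowest risk
rating_ordered <- factor(
  rating,
  levels = c("HR", "E", "D", "C", "B", "A", "AA"),
  ordered = TRUE
)

# Quarter labels sort alphabetically by default ("Q1 2010" before "Q4 2005"),
# so build the levels chronologically from the year and quarter number
quarters <- c("Q1 2010", "Q4 2005", "Q2 2009", "Q4 2009")
quarter_year   <- as.integer(substr(quarters, 4, 7))
quarter_number <- as.integer(substr(quarters, 2, 2))
quarter_levels <- unique(quarters[order(quarter_year, quarter_number)])

quarter_ordered <- factor(quarters, levels = quarter_levels, ordered = TRUE)

levels(rating_ordered)
levels(quarter_ordered)
```

With explicit levels, tables and plot axes follow the risk order or the calendar rather than alphabetical order.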

Bivariate Plots Section

Analysis of Credit Grade and Credit Score Group

This chart shows the relationship between Credit Grade and Upper Credit Score.
There are a surprising number of loans for borrowers who either have “No
Credit” or are considered “High Risk”. It also seems odd that the borrowers
in the “High Risk” grade are not all in the same credit score group, namely
“Bad”. It would seem confusing to a lender to have this group split when the
E and D Credit Grades are much more uniform.

Analysis of Prosper Rating and Credit Score Group

This chart shows the relationship between Prosper Rating and Upper
Credit Score. Post-2010 there are no borrowers with a “Bad” credit score. It
also seems odd that there are borrowers with a “Poor” through “Excellent”
credit rating in the High Risk category. It would seem very confusing to a
lender to have this mix of credit ratings across the Prosper Rating groups. It
is very odd to have a “High Risk” loan for a borrower with an “Excellent”
credit rating.

Even with the mix of credit ratings within a Prosper Rating, there is a clear
shift towards better credit scores as the Prosper Rating improves.

Analysis of Borrower Rate vs Credit Grade and Prosper Rating

These two graphs show the relationship between Borrower Rate and the
pre-2010 Credit Grade and post-2010 Prosper Rating. The Borrower Rates for
Credit Grades “NC” through “E” were between 15% and 28%. For Credit Grades D,
C, B, A, and AA there is a downward stair step.

The Borrower Rate for the post-2010 Prosper Rating “HR” has a median value
above 30%, which is higher than for the lowest Credit Grades pre-2010. There
is a clear downward stair step towards “AA”, where the Borrower Rate median
and interquartile range are below 10%.

Analysis of Loan Status vs Prosper Rating, Credit Grade and Prosper Score

These three graphs compare the loan status to each of the three variables
measuring credit quality and risk. The Credit Grade graph shows that the 35%
of defaults in loans pre-2010 occurred across all the Credit Grades. I was
surprised to see about 2% of loans in AA in default, which made me question
the validity of the rating. As I mentioned earlier in the report, I could not
find details on how the Credit Grade was determined.

The Prosper Rating graph shows a decrease in loans “InDefault” as the Prosper
Rating increases. This is more in line with what I would expect for a borrower
loan rating.

The Prosper Score is an internal scorecard and estimates the probability of a
loan going “bad,” but only looks at the possibility of a loan going bad within
the first year of the loan. In the univariate analysis, we saw that almost
80% of the loans are for three years. As we can see in the chart above, loans
with a Prosper Score of six or above also had about a 1% default rate. If
Prosper updated their model to look at longer timelines, this score could be
more accurate and reduce the post-2010 10% default rate.
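
A default rate per Prosper Score, like the one read off the chart, can be computed as the share of “InDefault” loans within each score. The sketch below uses a tiny made-up data frame (`toy_loans`, hypothetical values only) and is not the code behind the chart.

```r
# Made-up loans: a Prosper Score and a status group for each (toy values)
toy_loans <- data.frame(
  ProsperScore = c(2, 2, 2, 2, 6, 6, 6, 6, 10, 10),
  LoanStatusGroup = c("InDefault", "InDefault", "GoodStanding", "GoodStanding",
                      "InDefault", "GoodStanding", "GoodStanding",
                      "GoodStanding", "GoodStanding", "GoodStanding"),
  stringsAsFactors = FALSE
)

# The mean of a TRUE/FALSE indicator is the proportion of TRUE values,
# so this gives the share of loans in default for each score
default_rate <- tapply(
  toy_loans$LoanStatusGroup == "InDefault",
  toy_loans$ProsperScore,
  mean
)

round(100 * default_rate, 1)
```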

Analysis of Loan Original Amount vs Prosper Rating and Credit Grade

I used a scatter plot with an alpha of 1/4 to plot Loan Original Amounts vs
the Credit Grade and Prosper Rating. The graph of Credit Grade shows a larger
number of defaults for loan amounts greater than $5000 for Credit Grades B and
lower. The higher loan amounts even across credit grades are more likely to
be “InDefault” as is shown by the predominantly blue color.

The Prosper Rating graph shows the much lower rates of default as we saw
earlier. It also shows lower Loan Amounts for borrowers in the High Risk
group post-2010 compared to pre-2010. Post-2010 shows much more restraint and
consistency when loaning money to borrowers with lower ratings.

Analysis of Monthly Loan Payment vs Prosper Rating and Credit Grade

For both plots, I only looked at Loans for $10000 to be able to compare Monthly
Loan Payments pre-2010 and post-2010. I took the sqrt of the Monthly Loan
Payment to adjust for the skewness in the data. The plot for Monthly Loan
Payments post-2010 shows that the interquartile ranges for payments across all
Prosper Ratings are below $500. For pre-2010 loans, it was interesting
to see that the interquartile ranges for loans in default were all above $500.
We can see from these charts that a monthly loan payment below $500 has an
impact on keeping the loan in good standing.
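
The square-root transform and the interquartile ranges discussed above can be sketched in base R as follows. The `payments` data frame, its `Status` labels, and all values are assumptions made up for illustration; the report's own plotting code is not shown here.

```r
# Toy monthly payments (hypothetical values, not from the Prosper dataset).
payments <- data.frame(
  MonthlyLoanPayment = c(100, 225, 400, 625, 900,
                         144, 256, 324, 484, 576),
  Status = rep(c("InDefault", "GoodStanding"), each = 5)
)

# Square-root transform compresses the long right tail of the payments.
payments$SqrtPayment <- sqrt(payments$MonthlyLoanPayment)

# 25th and 75th percentiles per status, in dollars (not transformed).
iqr_by_status <- tapply(
  payments$MonthlyLoanPayment, payments$Status,
  quantile, probs = c(0.25, 0.75)
)
print(iqr_by_status)
```

Comparing each group's quartiles against a threshold such as $500 is then a matter of inspecting the two values returned for that group.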

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

I summarized my findings after each plot or group of plots above.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The Monthly Loan Payment analysis was interesting. Having a payment amount
that the borrower can make each month has a big influence on keeping the loan
in good standing.

What was the strongest relationship you found?

The strongest relationship I found was between BorrowerRate and ProsperRating.
It was interesting because, if a borrower wanted to reduce the rate they paid,
it was not transparent what actions they could take. The Prosper
Rating is composed of the Credit Score and the Prosper Score. If they want to
increase their credit score they can call the credit service and see details.
For the Prosper Rating or Prosper Score it isn’t as clear how those are
calculated, yet they affect the rate the borrower pays.

Multivariate Plots Section

## 
## Calls:
## Model 1 : lm(formula = BorrowerRate ~ Term, data = subset_vars)
## Model 2 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper, data = subset_vars)
## Model 3 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio, 
##     data = subset_vars)
## Model 4 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio + 
##     ProsperScore, data = subset_vars)
## Model 5 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio + 
##     ProsperScore, data = subset_vars)
## Model 6 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio + 
##     ProsperScore + ProsperRating..numeric., data = subset_vars)
## 
## ================================================================================================================
##                              Model 1       Model 2       Model 3       Model 4       Model 5       Model 6      
## ----------------------------------------------------------------------------------------------------------------
##   (Intercept)                  0.196***      0.764***      0.766***      0.635***      0.635***       0.328***  
##                               (0.001)       (0.003)       (0.004)       (0.003)       (0.003)        (0.001)    
##   Term                        -0.000         0.000***      0.000***      0.000***      0.000***       0.000***  
##                               (0.000)       (0.000)       (0.000)       (0.000)       (0.000)        (0.000)    
##   CreditScoreRangeUpper                     -0.001***     -0.001***     -0.000***     -0.000***       0.000***  
##                                             (0.000)       (0.000)       (0.000)       (0.000)        (0.000)    
##   DebtToIncomeRatio                                        0.028***      0.010***      0.010***      -0.001**   
##                                                           (0.001)       (0.001)       (0.001)        (0.000)    
##   ProsperScore                                                          -0.017***     -0.017***       0.001***  
##                                                                         (0.000)       (0.000)        (0.000)    
##   ProsperRating..numeric.                                                                            -0.045***  
##                                                                                                      (0.000)    
## ----------------------------------------------------------------------------------------------------------------
##   sigma                        0.074         0.064         0.062         0.051         0.051          0.021     
##   R-squared                    0.000         0.253         0.289         0.526         0.526          0.920     
##   F                            0.015     14032.034     10290.557     20992.080     20992.080     174124.310     
##   p                            0.901         0.000         0.000         0.000         0.000          0.000     
##   N                        82877         82877         75772         75772         75772          75772         
## ================================================================================================================
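
A minimal sketch of how a sequence of nested linear models like the ones above could be fitted and compared using base R only. The `toy` data frame is synthetic, and the relationship used to generate `BorrowerRate` is an assumption for illustration; it does not reproduce the numbers in the table.

```r
set.seed(42)  # reproducible toy data
n <- 200

# Synthetic stand-ins for the report's variables (values are made up).
toy <- data.frame(
  Term = sample(c(12, 36, 60), n, replace = TRUE),
  CreditScoreRangeUpper = round(runif(n, 600, 850)),
  ProsperRating..numeric. = sample(1:7, n, replace = TRUE)
)

# Assumed relationship, for illustration only: rate driven mostly by rating.
toy$BorrowerRate <- 0.35 - 0.04 * toy$ProsperRating..numeric. +
  rnorm(n, sd = 0.02)

# Grow the model one predictor at a time with update().
m1 <- lm(BorrowerRate ~ Term, data = toy)
m2 <- update(m1, . ~ . + CreditScoreRangeUpper)
m3 <- update(m2, . ~ . + ProsperRating..numeric.)

# Collect R-squared for each nested model.
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3),
             function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Because each model contains all predictors of the previous one, R-squared can only stay the same or increase from left to right, so a large jump points to the predictor that was just added.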

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

I focused my analysis in this section on looking at how to increase the
ability for borrowers to keep their loan in good standing.

The first two plots looked at MonthlyLoanPayment vs Term by LoanStatus for
borrowers with a ProsperScore below 6. I thought about looking at this as
if I were a data analyst for Prosper: there might be an opportunity for
Prosper to increase the Term or provide more flexibility on the Term (4 year
loans?) to reduce the monthly payment amount to < $250 and increase the
ability for the borrower to make the lower payments.
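
To illustrate why a longer Term lowers the monthly payment, here is a small worked example using the standard fixed-rate amortization formula, payment = P * r / (1 - (1 + r)^(-n)), where P is the principal, r the monthly rate, and n the number of months. The $10000 principal, the 19% annual rate, and the assumption that the rate stays the same for every term are all hypothetical; the helper `monthly_payment` is not part of the report's code.

```r
# Monthly payment for a fully amortizing fixed-rate loan.
# principal: amount borrowed; annual_rate: e.g. 0.19; months: loan term.
monthly_payment <- function(principal, annual_rate, months) {
  r <- annual_rate / 12                      # monthly interest rate
  principal * r / (1 - (1 + r)^(-months))
}

# Hypothetical loan: $10000 at 19% per year, for 3, 4, and 5 year terms.
terms <- c(36, 48, 60)
pay <- monthly_payment(10000, 0.19, terms)
names(pay) <- paste0(terms, "_months")
print(round(pay, 2))
```

In practice the rate would likely differ by term and borrower, so this only shows the direction of the effect, not what Prosper would actually charge.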

The second two plots looked at MonthlyLoanPayment vs Loan Original Amount by
Loan Status for borrowers with a ProsperScore below 6. Again I thought about
looking at this as if I were a data analyst for Prosper: there might be an
opportunity to refine the Loan Consolidation model to keep the payment
amounts lower.

Were there any interesting or surprising interactions between features?

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

I created a model for BorrowerRate to better understand how it was calculated.
I looked at Term, CreditScoreRangeUpper, DebtToIncomeRatio, ProsperScore, and
ProsperRating..numeric. The fit is strong once ProsperRating..numeric. is
included, with an R-squared of .92. It was surprising that the R-squared was so
low with Credit Score: .25 with Term and CreditScoreRangeUpper, and .29 after
adding DebtToIncomeRatio. It would be a challenge as a
borrower to figure out how to get a better rate on the Prosper platform
without increased transparency into how those metrics are calculated.
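
R-squared and correlation are related but not the same quantity. For a regression with a single predictor, R-squared equals the square of the Pearson correlation. The sketch below checks this on R's built-in `mtcars` data, which is unrelated to the Prosper dataset and is used only because it ships with R.

```r
# Single-predictor regression on a built-in dataset (not Prosper data).
fit <- lm(mpg ~ wt, data = mtcars)

r_squared <- summary(fit)$r.squared
pearson_r <- cor(mtcars$mpg, mtcars$wt)

# For one predictor, R-squared is the squared Pearson correlation.
print(c(r_squared = r_squared, r_times_r = pearson_r^2))
```

With several predictors, as in the models above, R-squared describes the fit of the whole model rather than the correlation with any one variable.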

The models I put together are simplistic at this point compared to the number
of variables associated with peer to peer lending but still insightful. After
I get more experience with model development beyond linear regression I would
like to revisit this analysis.


Final Plots and Summary

Plot One

Description One

I chose this plot because it showed a difference between the Credit Score and
the Prosper Rating. It was surprising that a “Poor” credit score would
appear in the higher Prosper Ratings.

Plot Two

Description Two

For this plot I looked at loans before 2010. I took the sqrt of the Monthly
Loan Payment to adjust for the skewness in the data. It was interesting to
see the median monthly payments for loans in good standing and loans in default
drift as the monthly payment rose. From this chart you can see that a lower
monthly payment has an impact on keeping the loan in good standing.

Plot Three

Description Three

For this plot I looked at loans originated after 2010. The graph shows that
the monthly payment currently tracks the Loan Amount rather than
whether a borrower can consistently make monthly payments. (The correlation
between Monthly Loan Payment and Loan Original Amount when calculated with
cor.test was .91.) As loan amounts increase, monthly payments increase.
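
A minimal sketch of a `cor.test` call of the kind mentioned above. The `amount` and `payment` vectors are synthetic, generated under an assumed linear relationship with noise, so the resulting estimate is not the .91 reported for the real data.

```r
set.seed(1)  # reproducible toy data

# Synthetic loan amounts and payments (made-up values).
amount  <- runif(100, 1000, 25000)
payment <- amount * 0.035 + rnorm(100, sd = 60)

# Pearson correlation test; the estimate is stored in $estimate
# and the p-value in $p.value.
ct <- cor.test(amount, payment)
print(ct$estimate)
print(ct$p.value)
```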

Prosper already made an adjustment in 2010 to keep payments lower. It would
be interesting for them to model whether adding more options for Term and
Monthly payment amounts would further reduce the post-2010 10% default rate.

========================================================

Reflection

For this exploratory data analysis I decided to take on one of the larger
data sets. With 81 variables to explore, it was easy to get off track and
look into each of the variables. It took a bit to figure out which variables
were important and I was afraid to miss an insight. I did a lot of plots as a
result and many of them are not included because they were not that useful. I
decided to keep my focus on questions I thought Prosper’s customers would
care about.

I was very surprised at the Peer to Peer business environment pre-2010 that
led to action from the SEC. I found it incredibly helpful to have domain
knowledge for Peer to Peer Lending and an overview of the business; both
provided much needed context to what I was seeing in the data.

The interesting models I would like to build are more complex and outside
the scope of EDA. For example the Prosper Score is an internal scorecard and
estimates the probability of a loan going “bad,” where “bad” means
going 60+ days past due within the first twelve months from
the date of loan origination. This isn’t helpful for loans with a 3 year or
5 year term. The lenders would probably like a better score to predict whether
a loan is likely to get paid off. It would also be good to know if
focusing on lower monthly payments would help keep the loans in good standing.
The loans are not secured, Prosper has discontinued use of a secondary market,
and the lender is bearing the risk.

What did go well is that my EDA skills have improved. I also have a much better
understanding of when to apply a transform to a variable and what plots to use
and in what circumstance. The example EDA projects were inspiring and I was
able to use a few tips. I am excited to take on machine learning. I would
like to develop a model that attempts to forecast what types of loans are likely
to stay in good standing as a future project.